Abstract

We consider differentially private approximate singular vector computation. Known 
worst-case lower bounds show that the error of any differentially private algorithm must 
scale polynomially with the dimension of the singular vector. We are able to replace this 
dependence on the dimension by a natural parameter known as the coherence of the matrix 
that is often observed to be significantly smaller than the dimension both theoretically 
and empirically. We also prove a matching lower bound showing that our guarantee is 
nearly optimal for every setting of the coherence parameter. Notably, we achieve our 
bounds by giving a robust analysis of the well-known power iteration algorithm, which 
may be of independent interest. Our algorithm also leads to improvements in worst-case 
settings and to better low-rank approximations in the spectral norm. 
1 Introduction



Spectral analysis of graphs and matrices is one of the most fundamental tools in data mining. 
The singular vectors of data matrices are used for spectral clustering, principal component 
analysis, latent semantic indexing, manifold learning, multi-dimensional scaling, low rank 
matrix approximation, collaborative filtering, and matrix completion. They provide a means 
of avoiding the curse of dimensionality by discovering an (approximate) low-dimensional 
representation of seemingly very high dimensional data. Unfortunately, many of the datasets 
for which spectral methods are ideal are composed of sensitive user information: browsing 
histories, friendship networks, movie reviews, and other data collected from private user 
interactions. The Netflix Prize dataset is a perfect example of this phenomenon: a matrix of
user/movie review pairs, built from supposedly "anonymized" user records, was released for
the Netflix Prize Challenge. The goal of the competition was to predict
user/movie review pairs missing from the matrix. Unfortunately, the ad-hoc anonymization
of this dataset proved to be insufficient, and Narayanan and Shmatikov [NS08] were able to 
re-identify many of the users. Because of the privacy concerns that the attack brought to 
light, the second proposed Netflix challenge was canceled. 

In the past decade, a rigorous formulation of privacy known as differential privacy has 
been developed, along with a collection of powerful theoretical results. With very few 
exceptions, existing algorithms come with utility guarantees that hold in the worst case over 
the choice of the private data. As a result, these utility bounds can sometimes be too weak to 
be meaningful on particular data sets of interest. 

Several algorithms are known for computing approximate top singular vectors of a matrix 
under differential privacy. In fact, nearly optimal error bounds are known in the worst case. 
Unfortunately, differential privacy unavoidably forces these bounds to degrade with the 
dimension of the data. More concretely, given an $n \times n$ matrix $A$, any differentially private
algorithm must in the worst case output a vector $x$ such that $\|Ax\|_2 \le \sigma_1(A) - \Omega(\sqrt{n})$, where
$\sigma_1(A)$ denotes the top singular value of $A$. If the matrix $A$ has bounded entries and is sparse,
as is very common, the dependence on $n$ in the error term can easily overwhelm the signal.
This dependence on n is discouraging, because one of the most compelling goals of tools 
such as PCA is to overcome the "curse of dimensionality" inherent in the analysis of very 
high dimensional data. We therefore ask the question: Can we hope to achieve a nearly 
dimension-free bound under a reasonable assumption on the input matrix? 

We answer this question in the affirmative. Specifically, we give an algorithm to compute
an approximate singular vector that achieves error $O(\sqrt{\mu(A)}\log(n))$. Here, $\mu(A)$ denotes the
coherence of the input matrix. The coherence varies between 1 and $n$. We say that $A$ has
low coherence if $\mu(A)$ is significantly smaller than $n$. Roughly, a matrix has low coherence if
none of its singular vectors have any large coordinates. Low coherence is a widely observed
property of large matrices. Random matrix models exhibit low coherence, as do many real-world
matrices. Indeed, many recent results in matrix completion, robust principal component
analysis and low-rank approximation rely crucially on the assumption that the input matrix
has low coherence. The error of our algorithm depends essentially only on the square root 
of the coherence of the data matrix. Moreover, we show that the exact dependence on the 
coherence that we achieve is best possible: Specifically, for each value of the coherence 
parameter, we give a family of matrices for which no differentially private algorithm can get 
a better approximation to the top singular vector than our algorithm does, up to logarithmic 
factors.

Our algorithm is also highly efficient and can be implemented using a nearly linear 
number of vector inner product computations. In particular, our running time is nearly 
linear in the number of nonzeros of the matrix. In fact, our algorithm is a new variant of the 
classical power iteration method that has long been the basis of many practical eigenvalue 
solvers. 



1.1 Our Results 

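We say that a matrix $A \in \mathbb{R}^{m\times n}$ with singular value decomposition $A = U\Sigma V^\top$ has coherence

$$\mu(A) \stackrel{\text{def}}{=} \max\{m\|U\|_\infty^2,\; n\|V\|_\infty^2\},$$

where $\|\cdot\|_\infty$ denotes the largest absolute value of an entry. For now we assume that $m = n$,
but all of our results apply to general matrices. Note that
$\mu(A) \in [1,n]$. We give a simple $(\varepsilon,\delta)$-differentially private algorithm which achieves the
following guarantee.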

Theorem 1.1 (Informal, some parameters hidden). For any matrix $A$ that satisfies a mild
assumption on the decay of its singular values, Private Power Iteration returns a vector $x$ such that
with high probability

$$\frac{\|Ax\|}{\|x\|} \ge \sigma_1(A) - O\left(\varepsilon^{-1}\sqrt{\mu(A)\log(1/\delta)}\,\log n\right).$$
We also show a nearly matching lower bound:

Theorem 1.2 (Informal). For any coherence parameter $c \in \{2,\dots,n\}$, there exists a family of
matrices $\mathcal{A}$ such that for each $A \in \mathcal{A}$, $\mu(A) = c$, and such that for every $(\varepsilon,\delta)$-differentially private
algorithm $M$ with $\delta = \Omega(1/n)$ there is a matrix $A \in \mathcal{A}$ so that with high probability, $M(A)$ outputs
a vector $x$ such that

$$\frac{\|Ax\|_2}{\|x\|_2} \le \sigma_1(A) - \Omega\left(\varepsilon^{-1}\sqrt{c}\right).$$

Note that in addition to showing that our dependence on $\mu(A)$ is tight, this theorem shows
that the error of any data-independent guarantee must be at least $\Omega(\varepsilon^{-1}\sqrt{n})$.

Finally, we show how our algorithm can be used to compute accurate rank-$k$ approximations
to the private matrix $A$ in the spectral norm, for any $k$. For $k = 1$, the quality of our
approximation is optimal. For $k \ge 2$, as in previous work [HR12], our bounds depend on $r$, where $r$
is the rank of $A$. Note that these bounds still improve on the best worst-case bounds when $A$
is low rank.

Theorem 1.3 (Informal, some parameters hidden). There is an $(\varepsilon,\delta)$-differentially private
algorithm such that for any matrix $A$ that satisfies a mild assumption on the decay of its singular
values, it returns a rank-1 matrix $A_1$ such that with high probability

$$\|A - A_1\|_2 \le \sigma_2(A) + O\left(\varepsilon^{-1}\sqrt{\mu(A)\log(1/\delta)}\,\log n\right).$$

Moreover, there is an $(\varepsilon,\delta)$-differentially private algorithm such that for any rank-$r$ matrix $A$ that
satisfies a mild assumption on the decay of its singular values, it returns a rank-$k$ matrix $A_k$ such
that with high probability:

$$\|A - A_k\|_2 \le \sigma_{k+1}(A) + O\left(\varepsilon^{-1}k^2\sqrt{(r\mu(A) + k\log n)\log(1/\delta)}\,\log n\right).$$
1.2 More efficient and improved worst-case bounds

Our robust power iteration analysis can also be applied easily to worst-case settings without 
any incoherence assumptions. For example, we resolve multiple questions asked by Kapralov 
and Talwar [KT13]. Specifically we improve the running time of their algorithm by large 
polynomial factors, give a much simpler algorithm and improve the error dependence on k. 
In the main body of the paper we study differential privacy under changes of single entries. 
Here, we consider unit changes in spectral norm as proposed by [KT13]. Our algorithm 
easily adapts to this definition and gives the following corollary. 

Corollary 1.4. There is an algorithm such that for every matrix $A$ that satisfies a mild assumption
on the decay of its singular values, it returns a rank-$k$ matrix $A_k$ such that with high probability,

$$\|A - A_k\|_2 \le \sigma_{k+1}(A) + O\left(\varepsilon^{-1}k^2\sqrt{n\log(1/\delta)}\,\log n\right). \qquad (1)$$

Moreover, the algorithm satisfies $(\varepsilon,\delta)$-differential privacy under unit spectral perturbations. For
$(\varepsilon,0)$-differential privacy the error bound satisfies

$$\|A - A_k\|_2 \le \sigma_{k+1}(A) + O\left(\varepsilon^{-1}k^2 n\log n\right).$$

We stress that Equation 1 is the first bound for $(\varepsilon,\delta)$-differential privacy under unit
spectral norm perturbations. The dependence on $n$ matches the error achieved by randomized
response for single entry changes.

1.3 Our Techniques 

Our main technical contribution includes a novel "robust" analysis of the classical power
iteration algorithm for computing the top eigenvector of a matrix, which may be of
independent interest. Specifically, we analyze power iteration in which an arbitrary sequence of
perturbations $g_1,\dots,g_T$ may be added to the matrix vector products at each round $1,\dots,T$. We
give simple conditions on the perturbation vectors $g_1,\dots,g_T$ such that under these conditions,
perturbed powering of a matrix $A \in \mathbb{R}^{n\times n}$ for $O(\log n)$ rounds results in a vector $x$ such that
$\|Ax\|/\|x\| \ge (1-\beta)\sigma_1(A)$, where $\sigma_1(A)$ is the top singular value of $A$. Using this general analysis,
we are then free to choose the perturbations appropriately to guarantee differential privacy.
The accuracy bounds we obtain are a function of the scale of the noise that is necessary for
privacy.

It is immediate that the magnitude of the perturbation that must be used to guarantee 
differential privacy (of the matrix) when computing a matrix vector product is proportional 
to the magnitude of the largest coordinate in the vector. To prove our accuracy guarantees, 
therefore, it suffices to bound the maximum magnitude of any coefficient of any of the 
vectors $x_1,\dots,x_T$ that emerge during the steps of power iteration. Of course, if the matrix is
incoherent, then each $x_t$ can be written as a linear combination of basis vectors that each have
small coordinates: $x_t = \sum_i \alpha_i v_i$. Unfortunately this does not suffice to guarantee that $x_t$ will
have small coordinates without incurring a blow-up that depends on the number of nonzero
coefficients. However, we show that at each round, $\mathrm{sign}(\alpha_1),\dots,\mathrm{sign}(\alpha_n)$ are independent,
unbiased {-1, 1} random variables. This, together with the incoherence assumption, is enough 
to complete the analysis. 
Finding a unit vector $x$ such that $\|Ax\| \ge (1-\beta)\sigma_1(A)$ is sufficient to compute an accurate
rank-1 approximation to $A$ in spectral norm. If $x$ were exactly equal to the top singular vector
of $A$, we could then recurse, and compute the top singular vector of $A' = A - \sigma_1 xx^\top$, from
which we could compute an optimal rank-2 approximation to $A$. Unfortunately, $x$ is only an
approximation to the top singular vector. Therefore, in order to be able to usefully recurse
on $A' = A - \sigma_1 xx^\top$, we require two conditions: (1) that $\|A'\|_2 \approx \sigma_2(A)$, and (2) that $A'$ is nearly
as incoherent as $A$. Condition (1) has already been shown by Kapralov and Talwar [KT13].
Therefore, it remains for us to show condition (2). We show that indeed the incoherence
of the matrix cannot increase by more than a factor of $\sqrt{r}$, where $r$ is the rank of $A$, during
any number of "deflation" steps. However, we do not know whether this factor of $\sqrt{r}$ is
necessary, or is merely an artifact of our analysis. We leave removing this factor of $\sqrt{r}$ from
our approximation factor for computing rank-$k$ approximations when $k \ge 2$ as an intriguing
open problem.

Finally, we give a pointwise lower bound that shows that (up to log factors), our algorithm 
for privately computing singular vectors is tight for every setting of the coherence parameter. We 
do this by reducing to reconstruction lower bounds of Dinur and Nissim [DN03]. Specifically, 
we show, for every coherence parameter C, how to construct a matrix with coherence C from 
some private bit-valued database D such that improving on the performance of our algorithm 
would imply that an adversary would be able to reconstruct D. Since reconstruction attacks 
are precluded by reasonable values of $\varepsilon$ and $\delta$, a lower bound for all $(\varepsilon,\delta)$-private algorithms
follows.

1.4 Related Work 

There is by now an extensive literature on a wide variety of differentially private
computations, which we do not attempt to survey here. Instead we focus on only the most relevant
recent work.

There are several papers that consider the problem of privately approximating the 
singular vectors of a matrix without any assumptions on the data. Blum et al. [BDMN05] 
first studied this problem, and gave a simple "input perturbation" algorithm based on 
adding noise directly to the covariance matrix. Chaudhuri et al. [CSS12] and Kapralov and
Talwar [KT13] give matching worst-case upper and lower bounds for privately computing
the top eigenvector of a matrix under the constraint of $(\varepsilon,0)$-differential privacy: they
achieve additive error $O(n/\varepsilon)$. Both algorithms involve sampling a singular vector from the
exponential mechanism. [KT13] also give a polynomial time algorithm for performing this
sampling from the exponential mechanism, whereas [CSS12] give a heuristic but practical
implementation using Markov chain Monte Carlo. Our algorithm matches these worst-case
bounds, and also gives worst-case bounds for $(\varepsilon,\delta)$-privacy, with error $O(\sqrt{n}/\varepsilon)$. In the event
that the matrix has low coherence, we improve substantially over the worst case bounds. 
Moreover, we give the first analysis of a natural, efficient algorithm for this problem. Indeed, 
our algorithm is simply a variant on the classic power iteration method, and runs in time 
nearly linear in the input sparsity. 

Low coherence conditions have recently been studied in a number of papers for a number
of matrix problems, and low coherence is a commonly satisfied condition on matrices. Recently, Candes
and Recht [CR09] and Candes and Tao [CT10] considered the problem of matrix completion. 
Accurate matrix completion is impossible for arbitrary matrices, but [CR09, CT10] show 
the remarkable result that it is possible under low coherence assumptions. Candes and Tao
[CT10] also show that almost every matrix satisfies a low coherence condition, in the sense 
that randomly generated matrices will be low coherence with extremely high probability. 

Talwalkar and Rostamizadeh recently used low-coherence assumptions for the problem
of (non-private) low-rank matrix approximation [TR10]. They showed that under
low-coherence assumptions similar to those of [CR09, CT10], the spectrum of a matrix is in
fact well approximated by a small number of randomly sampled columns, and give formal
guarantees on the approximation quality of the sampling-based Nyström method of low-rank
matrix approximation.

Most related to this paper is Hardt and Roth [HR12], which gives an algorithm for computing
a rank-$k$ approximation to a private matrix $A$ in the Frobenius norm, where the approximation
quality also depends on a (slightly different) notion of matrix coherence. This work differs 
from [HR12] in several respects. First, a matrix may not have any good approximation in 
the Frobenius norm (and hence the bounds of [HR12] might be vacuous), but still might 
have an excellent approximation in the spectral norm. Second, [HR12] does not give any 
means to actually compute the top singular vector of the private matrix, and hence cannot 
be easily used for applications (such as PCA, or spectral clustering) that require direct 
access to the singular vector itself. Moreover, unlike in this paper, [HR12] do not show that 
their dependence on the coherence is tight — only that their guarantees surpass any data- 
independent worst case guarantees. The bounds of [HR12] also incur a constant multiplicative 
error, in addition to an additive error. In this paper, we are able to avoid any multiplicative 
error. Finally, the bounds of [HR12] depend on the rank of the private matrix A, a dependence
that we are able to remove when computing the top singular vector of A, as well as a rank 1 
approximation of A. 

Related to the problem of approximating the spectrum of a matrix is the problem of 
approximating cuts in a graph. This problem was first considered by Gupta, Roth, and 
Ullman [GRU12] who gave methods for efficiently releasing synthetic data for graph cuts 
with additive error $O(n^{1.5})$. Blocki et al. [BBDS12] gave a method which achieves improved
error for small cuts, but does not improve the worst-case error. Improving these bounds to the
information theoretically optimal bound of $O(n\log n)$ via an efficient algorithm remains an
interesting open question. Note that smaller error is efficiently achievable for a polynomial 
number of cut queries, using private multiplicative weights [HR10] or randomized response. 
2 Preliminaries

We view our dataset as a real valued matrix $A \in \mathbb{R}^{m\times n}$.

Definition 2.1. We say that two matrices $A, A' \in \mathbb{R}^{m\times n}$ are neighboring if $A - A' = \alpha e_s e_t^\top$ where
$e_s, e_t$ are two standard basis vectors and $\alpha \in [-1,1]$. In other words, $A$ and $A'$ differ in at most
one entry, by at most 1 in absolute value.
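
As a small illustration of Definition 2.1, the following sketch checks the neighboring relation numerically. The function name `are_neighboring` and the tolerance `tol` are hypothetical choices made for this example only.

```python
import numpy as np

def are_neighboring(A, A_prime, tol=1e-12):
    """Return True if A and A_prime differ in at most one entry, by at most 1."""
    diff = np.asarray(A, dtype=float) - np.asarray(A_prime, dtype=float)
    changed = np.abs(diff) > tol           # entries that actually differ
    if changed.sum() > 1:                  # more than one entry changed
        return False
    return bool(np.abs(diff).max() <= 1.0 + tol)

A = np.zeros((3, 3))
one_small_change = A.copy()
one_small_change[1, 2] = 0.5               # one entry, magnitude 0.5
two_changes = one_small_change.copy()
two_changes[0, 0] = 0.5                    # a second entry differs
one_large_change = A.copy()
one_large_change[1, 2] = 2.0               # one entry, but magnitude above 1
```

Only `one_small_change` is a neighbor of `A` in the sense of the definition; the other two matrices violate the single-entry or the magnitude constraint.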

We use the by now standard privacy solution concept of differential privacy: 
Definition 2.2. An algorithm $M\colon \mathbb{R}^{m\times n} \to R$ (where $R$ is some arbitrary abstract range) is
$(\varepsilon,\delta)$-differentially private if for all pairs of neighboring databases $A, A' \in \mathbb{R}^{m\times n}$, and for all
subsets of the range $S \subseteq R$, we have $\mathbb{P}\{M(A) \in S\} \le \exp(\varepsilon)\,\mathbb{P}\{M(A') \in S\} + \delta$.

We make use of the following useful facts about differential privacy. 

Fact 2.3. If $M\colon \mathbb{R}^{m\times n} \to R$ is $(\varepsilon,\delta)$-differentially private, and $M'\colon R \to R'$ is an arbitrary
randomized algorithm mapping $R$ to $R'$, then $M'(M(\cdot))\colon \mathbb{R}^{m\times n} \to R'$ is $(\varepsilon,\delta)$-differentially private.

The following useful theorem of Dwork, Rothblum, and Vadhan tells us how differential 
privacy guarantees compose. 

Theorem 2.4 (Composition [DRV10]). Let $\varepsilon,\delta \in (0,1)$, $\delta' > 0$. If $M_1,\dots,M_k$ are each $(\varepsilon,\delta)$-differentially
private algorithms, then the algorithm $M(A) = (M_1(A),\dots,M_k(A))$ releasing the
concatenation of the results of each algorithm is $(k\varepsilon,k\delta)$-differentially private. It is also $(\varepsilon',k\delta+\delta')$-differentially
private for $\varepsilon' < \sqrt{2k\ln(1/\delta')}\,\varepsilon + 2k\varepsilon^2$.
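
To get a feeling for the second bound in Theorem 2.4, the sketch below evaluates the expression $\sqrt{2k\ln(1/\delta')}\,\varepsilon + 2k\varepsilon^2$ and inverts it to find a per-round $\varepsilon$ for a target total. The helper names and the example numbers are assumptions made for illustration, not values taken from the paper.

```python
import math

def composed_epsilon(eps, k, delta_prime):
    """Evaluate sqrt(2k ln(1/delta')) * eps + 2k * eps^2 from Theorem 2.4."""
    return math.sqrt(2 * k * math.log(1 / delta_prime)) * eps + 2 * k * eps ** 2

def per_round_epsilon(target_eps, k, delta_prime):
    """Solve the quadratic 2k*e^2 + sqrt(2k ln(1/delta'))*e = target_eps for e > 0."""
    a = 2 * k
    b = math.sqrt(2 * k * math.log(1 / delta_prime))
    return (-b + math.sqrt(b * b + 4 * a * target_eps)) / (2 * a)

k, delta_prime, target = 100, 1e-6, 1.0
eps_round = per_round_epsilon(target, k, delta_prime)
total = composed_epsilon(eps_round, k, delta_prime)   # equals the target up to rounding
basic_total = k * eps_round                            # what the (k*eps, k*delta) bound would give
```

For these illustrative numbers the per-round budget obtained from the second bound is larger than the `target / k` that the first bound would allow, which is why the second bound is the one used when composing many rounds.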

We denote the 1-dimensional Gaussian distribution of mean $\mu$ and variance $\sigma^2$ by
$N(\mu,\sigma^2)$. We use $N(\mu,\sigma^2)^d$ to denote the distribution over $d$-dimensional vectors with i.i.d.
coordinates sampled from $N(\mu,\sigma^2)$. We write $X \sim D$ to indicate that a variable $X$ is
distributed according to a distribution $D$. We note the following useful fact about the Gaussian
distribution.

Fact 2.5. If $g_i \sim N(\mu_i,\sigma_i^2)$ are independent, then $\sum_i g_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$.

The following theorem is well known folklore. 

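Theorem 2.6 (Gaussian Mechanism). Let $\varepsilon > 0$, $\delta \in (0,1/2)$. Let $u,v \in \mathbb{R}^d$ be any two vectors
such that $\|u-v\|_2 \le c$. Put $\sigma = 4c\varepsilon^{-1}\sqrt{\log(2/\delta)}$. Then, for every measurable set $A \subseteq \mathbb{R}^d$ and
$g \sim N(0,\sigma^2)^d$, we have $\exp(-\varepsilon)\,\mathbb{P}\{v+g \in A\} - \delta \le \mathbb{P}\{u+g \in A\} \le \exp(\varepsilon)\,\mathbb{P}\{v+g \in A\} + \delta$.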



Vector and matrix norms. We denote by $\|\cdot\|_p$ the $\ell_p$-norm of a vector and sometimes use
$\|\cdot\|$ as a shorthand for the Euclidean norm. Given a real $m \times n$ matrix $A$, we will work with
the spectral norm $\|A\|_2$ and the Frobenius norm $\|A\|_F$ defined as

$$\|A\|_2 \stackrel{\text{def}}{=} \max_{\|x\|=1}\|Ax\| \quad\text{and}\quad \|A\|_F \stackrel{\text{def}}{=} \sqrt{\sum_{i,j} a_{ij}^2}. \qquad (2)$$

For any $m \times n$ matrix $A$ of rank $r$ we have $\|A\|_2 \le \|A\|_F \le \sqrt{r}\cdot\|A\|_2$.
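
The norm inequality above is easy to check numerically. The following sketch uses NumPy on a random rank-3 matrix; the dimensions and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r, m, n = 3, 8, 10
# Product of an m x r and an r x n Gaussian matrix has rank r almost surely.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

spectral = np.linalg.norm(A, 2)        # largest singular value
frobenius = np.linalg.norm(A, "fro")   # square root of the sum of squared entries
rank = np.linalg.matrix_rank(A)

lower_ok = spectral <= frobenius + 1e-12
upper_ok = frobenius <= np.sqrt(rank) * spectral + 1e-12
```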



Singular Value Decomposition. Let $A \in \mathbb{R}^{m\times n}$ be a matrix. The right singular vectors of $A$
are the eigenvectors of $A^\top A$. The left singular vectors of $A$ are the eigenvectors of $AA^\top$. The
singular values of $A$ are denoted by $\sigma_i(A)$ and defined as the square root of the $i$-th eigenvalue
of $A^\top A$. The singular value decomposition is any decomposition of $A$ satisfying $A = U\Sigma V^\top$
where $U \in \mathbb{R}^{m\times m}$, $V \in \mathbb{R}^{n\times n}$ are unitary matrices and $\Sigma \in \mathbb{R}^{m\times n}$ satisfies $\Sigma_{ii} = \sigma_i(A)$ and $\Sigma_{ij} = 0$
for $i \ne j$. The columns of $U$ are the left singular vectors of $A$ and the columns of $V$ are the right
singular vectors of $A$.
2.1 Matrix coherence

We will work with the following standard notion of coherence throughout the paper. 

Definition 2.7 ($\mu$-Coherence). Let $A \in \mathbb{R}^{m\times n}$ with $m \le n$ be a symmetric real matrix with a
given singular value decomposition $A = U\Sigma V^\top$. We define the $\mu$-coherence of $A$ with respect
to $U$ and $V$ as

$$\mu(A) \stackrel{\text{def}}{=} \max\{m\|U\|_\infty^2,\; n\|V\|_\infty^2\}.$$

Note that $1 \le \mu(A) \le n$.

We remark that the coherence of A is defined with respect to a particular singular value 
decomposition since the SVD is in general not unique. 
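
The coherence of Definition 2.7 can be evaluated directly from an SVD. The sketch below is a minimal illustration that uses whichever decomposition `numpy.linalg.svd` returns; the function name `coherence` and the two example matrices are assumptions for this example. Both examples have distinct singular values, so their singular vectors are unique up to sign and the computed value does not depend on the choice of SVD.

```python
import numpy as np

def coherence(A):
    """mu(A) = max(m * ||U||_inf^2, n * ||V||_inf^2) for the SVD returned by NumPy."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    U, _, Vt = np.linalg.svd(A)            # full SVD: U is m x m, Vt is n x n
    return max(m * np.abs(U).max() ** 2, n * np.abs(Vt).max() ** 2)

# Singular vectors are standard basis vectors: the largest possible coherence, n = 3.
spiky = np.diag([3.0, 2.0, 1.0])

# Singular vectors have all entries of magnitude 1/sqrt(2): the smallest possible coherence, 1.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
flat = H @ np.diag([2.0, 1.0]) @ H.T

mu_spiky = coherence(spiky)
mu_flat = coherence(flat)
```

The two matrices sit at the extremes of the range $1 \le \mu(A) \le n$: concentrated singular vectors give coherence $n$, perfectly spread ones give coherence 1.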



2.2 Reduction to symmetric matrices 

Throughout our work we will restrict our attention to real symmetric $n \times n$ matrices. All of
our results apply, however, more generally to asymmetric matrices. Indeed, given $A \in \mathbb{R}^{m\times n}$
with SVD $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$ and rank $r$, we can instead consider the symmetric $(m+n)\times(m+n)$
matrix

$$B = \begin{pmatrix} 0 & A \\ A^\top & 0 \end{pmatrix}.$$



The next fact summarizes all properties of B that we will need. 

Fact 2.8. The matrix $B$ has the following properties: $B$ has rank $2r$ and singular values $\sigma_1,\dots,\sigma_r$,
each occurring with multiplicity two. The singular vectors corresponding to a singular value $\sigma$
are spanned by the vectors $\{(u_i,0), (0,v_i) : \sigma_i = \sigma\}$. An entry change in $A$ corresponds to two entry
changes in $B$. Furthermore, $\mu(B) = \mu(A)$.

In particular, this fact implies that an algorithm to find the singular vectors of $B$ will
also recover the singular vectors of $A$ up to small loss in the parameters. Moreover, an
algorithm that achieves $(\varepsilon/2,\delta/2)$-differential privacy on $B$ is also $(\varepsilon,\delta)$-differentially private
with respect to $A$.
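
The rank and multiplicity claims of Fact 2.8 can be checked on a small example. The sketch below assumes that $B$ is the standard symmetric embedding with $A$ and $A^\top$ in the off-diagonal blocks, which is the construction consistent with the properties listed in the fact; the function name `symmetric_dilation` is a hypothetical label.

```python
import numpy as np

def symmetric_dilation(A):
    """Build the (m+n) x (m+n) symmetric matrix [[0, A], [A^T, 0]] (assumed form of B)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    return np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))                 # rank 3 almost surely
B = symmetric_dilation(A)

sv_A = np.linalg.svd(A, compute_uv=False)       # 3 singular values, descending
sv_B = np.linalg.svd(B, compute_uv=False)       # 8 singular values, descending
```

Each singular value of `A` appears twice among the singular values of `B`, and the remaining ones vanish, so `B` has rank $2r$. A change of one entry of `A` touches exactly two entries of `B`, which is the reason for halving the privacy parameters.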



3 Robust convergence of power iteration 

In this section we analyze a generic variant of power iteration in which a perturbation is 
added to the computation at each step. The noise vector can be chosen adaptively and 
adversarially in each round. We will derive general conditions under which power iteration 
converges. 
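
The generic scheme analyzed in this section can be sketched in a few lines. This is a minimal illustration, not the exact algorithm of the paper: the early-termination test is omitted because it needs the top singular value, and the callback `perturb`, the test matrix and the noise scale are assumptions chosen for the example.

```python
import numpy as np

def perturbed_power_iteration(A, T, perturb, x0):
    """Iterate x_t = (A x_{t-1} + g_t) / ||A x_{t-1} + g_t|| for T rounds.

    `perturb(t, x)` is a hypothetical callback that returns the perturbation g_t.
    """
    x = x0 / np.linalg.norm(x0)
    for t in range(1, T + 1):
        y = A @ x + perturb(t, x)        # noisy matrix-vector product
        x = y / np.linalg.norm(y)        # renormalize
    return x

rng = np.random.default_rng(0)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))       # orthonormal eigenvectors
eigenvalues = np.array([10.0, 5.0] + [1.0] * (n - 2))  # clear gap after the top value
A = Q @ np.diag(eigenvalues) @ Q.T

small_noise = lambda t, x: 0.01 * rng.standard_normal(n)
x = perturbed_power_iteration(A, T=40, perturb=small_noise, x0=rng.standard_normal(n))
quality = np.linalg.norm(A @ x) / eigenvalues[0]        # close to 1 when x is nearly optimal
```

With perturbations that are small relative to the top singular value and a spectral gap, the ratio `quality` ends up close to 1, which is the kind of guarantee the convergence lemma below makes precise.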

Input: Matrix $A \in \mathbb{R}^{n\times n}$, number of iterations $T \in \mathbb{N}$, parameter $\beta \in (0,1)$.

1. Let $x_0$ be a unit vector.

2. For $t = 1$ to $T$:

(a) Let $g_t$ be an arbitrary perturbation.

(b) Let $x'_t = Ax_{t-1} + g_t$.

(c) If $\|x'_t\| \ge (1-\beta)\sigma_1$, then terminate and output $x_{t-1}$.

(d) Otherwise let $x_t = x'_t/\|x'_t\|$ and continue.

Output: Vector $x_T \in \mathbb{R}^n$ unless the algorithm terminated previously.

Figure 1: Power iteration with adversarial noise

Lemma 3.1 (Robust Convergence). Let $A$ be a matrix such that $\sigma_{k+1}(A) \le (1-\gamma)\sigma_k(A)$ for some
$k < n$ and $\gamma > 0$. Let $U$ be the space spanned by the top $k$ singular vectors of $A$, let $V$ be the space
spanned by the last $n-k$ singular vectors. Further assume that there are numbers $\Delta, \Delta_U, \Delta_V > 0$
such that the following conditions are met:

1. For all $t$, $\|g_t\| \le \Delta$, $\|P_U g_t\| \le \Delta_U$ and $\|P_V g_t\| \le \Delta_V$.

2. $\|P_U x_0\| \ge \frac{8\Delta_U}{\gamma\sigma_k(A)}$ and $\|P_V x_0\| \ge \frac{8\Delta_V}{\gamma\sigma_k(A)}$.

3. $\sigma_k(A) \ge 9\Delta/(\beta\gamma)$, for some $0 < \beta < 1$.

Then, for $T = 4\log(\sigma_k(A))$, the algorithm outputs a vector $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax\|}{\|x\|} \ge (1-\beta)\sigma_k(A).$$



Proof. Put $\sigma = \sigma_k(A)$ and note that by assumption $\sigma_{k+1} \le (1-\gamma)\sigma_k$ for some $\gamma > 0$. We will
consider the potential function

$$\Psi_t = \frac{\|P_V x_t\|}{\|P_U x_t\|}.$$

Suppose that in some round $t$, we have

$$\sigma\|P_V x_{t-1}\| \ge \frac{8\Delta_V}{\gamma} \quad\text{and}\quad \sigma\|P_U x_{t-1}\| \ge \frac{8\Delta_U}{\gamma}. \qquad (3)$$

We note that by our assumption on the matrix, these conditions are met in the first round
$t = 1$ as a consequence of Item 2. Let us derive an expression for the potential drop in round
$t$ under the above assumption. We have, using Item 1,

$$\frac{\|P_V x_t\|}{\|P_U x_t\|} = \frac{\|P_V(Ax_{t-1}+g_t)\|}{\|P_U(Ax_{t-1}+g_t)\|} \le \frac{\|P_V Ax_{t-1}\| + \|P_V g_t\|}{\|P_U Ax_{t-1}\| - \|P_U g_t\|} \le \frac{(1-\gamma)\sigma\|P_V x_{t-1}\| + \Delta_V}{\sigma\|P_U x_{t-1}\| - \Delta_U}.$$

By the assumption in Equation 3, we have

$$\frac{(1-\gamma)\sigma\|P_V x_{t-1}\| + \Delta_V}{\sigma\|P_U x_{t-1}\| - \Delta_U} \le \frac{(1-7\gamma/8)\,\sigma\|P_V x_{t-1}\|}{(1-\gamma/8)\,\sigma\|P_U x_{t-1}\|} \le \left(1-\frac{\gamma}{2}\right)\frac{\|P_V x_{t-1}\|}{\|P_U x_{t-1}\|} = \left(1-\frac{\gamma}{2}\right)\Psi_{t-1}.$$

We furthermore claim that if the conditions in Equation 3 hold true in round $t$, then we must
have $\|P_U x_t\| \ge \|P_U x_{t-1}\|$. This follows from our previous analysis, because $\Psi_t < \Psi_{t-1}$ but

$$1 = \|x_t\| = \sqrt{\|P_U x_t\|^2 + \|P_V x_t\|^2}.$$

This in particular means that if the conditions are true in round $t$, then the second condition
in Equation 3 continues to be true in round $t+1$, and only the first condition can fail. At this
point we distinguish two cases.
Case 1. Suppose there is a round $t \le T$ where the first condition fails to hold.
Let $t^*$ be the smallest such round and put $x = x_{t^*-1}$. By the previous argument, in this round
we must have

$$1 = \|x\|^2 = \|P_V x\|^2 + \|P_U x\|^2 < \left(\frac{8\Delta_V}{\gamma\sigma}\right)^2 + \|P_U x\|^2. \qquad (4)$$

From this we conclude that $\|P_U x\| \ge \sqrt{1 - (8\Delta_V/\gamma\sigma)^2} \ge 1 - 8\Delta_V/\gamma\sigma$. Hence,

$$\|Ax + g_{t^*}\| \ge \|AP_U x\| - \|g_{t^*}\| \ge \left(1 - \frac{8\Delta_V}{\gamma\sigma}\right)\sigma - \Delta \ge \left(1 - \frac{8\Delta}{\gamma\sigma} - \frac{\Delta}{\sigma}\right)\sigma \ge \left(1 - \frac{9\Delta}{\gamma\sigma}\right)\sigma.$$

Here we used that $\Delta_V \le \Delta$, which is without loss of generality. Therefore, using Item 3,

$$\|x'_{t^*}\| \ge \left(1 - \frac{9\Delta}{\gamma\sigma}\right)\sigma \ge (1-\beta)\sigma.$$

This means that the algorithm terminates in round $t^*$ and outputs $x_{t^*-1}$, which satisfies the
conclusion of the lemma.

Case 2. Suppose there is no round $t \le T$ where Equation 3 fails. By our potential argument
and the choice of $T$, this means that

$$\Psi_T \le \left(1-\frac{\gamma}{2}\right)^T \Psi_0 \le \exp(-\gamma T/2)\,\Psi_0 \le \beta.$$

In particular, $x_T$ satisfies $\|P_V x_T\| \le \beta\|P_U x_T\| \le \beta$. Thus, $\|P_U x_T\| \ge \sqrt{1-\beta^2}$ and $\|Ax_T\| \ge
(1-\beta)\sigma$. This shows that $x_T$ satisfies the conclusion of the lemma. ■

The next corollary states a variant of Lemma 3.1 where we express all conditions in terms
of $\sigma_1(A)$ rather than $\sigma_k(A)$.

Corollary 3.2. Let $\gamma \in (0,1)$. Let $A$ be a matrix such that $\sigma_{k+1}(A) \le (1-\gamma/2)\sigma_1(A)$ for some $k < n$.
Let $U$ be the space spanned by the top $k$ singular vectors of $A$, let $V$ be the space spanned by the last $n-k$
singular vectors. Further assume that there are numbers $\Delta, \Delta_U, \Delta_V > 0$ such that the following
conditions are met:

1. For all $t$, $\|g_t\| \le \Delta$, $\|P_U g_t\| \le \Delta_U$ and $\|P_V g_t\| \le \Delta_V$.



3. $\sigma_1(A) \ge \frac{72k\Delta}{\beta\gamma(1-\gamma)}$.

Then, for $T = 4\log(\sigma_1(A))$, the algorithm outputs a vector $x \in \mathbb{R}^n$ such that

$$\frac{\|Ax\|}{\|x\|} \ge (1-\beta)\sigma_1(A).$$
Proof. We claim that there exists a $k' \le k$ such that $\sigma_{k'+1}(A) \le (1-\gamma/4k)\sigma_{k'}(A)$. Indeed, if this is
not the case then

$$\sigma_{k+1}(A) > \prod_{i=1}^{k}\left(1-\frac{\gamma}{4k}\right)\sigma_1(A) \ge \left(1-\frac{\gamma}{2}\right)\sigma_1(A),$$

thus violating the assumption of the corollary. Moreover, $k'$ satisfies $\sigma_{k'}(A) \ge (1-\gamma/2)\sigma_1(A)$.
We will thus apply Lemma 3.1 to this $k'$, setting $\gamma' = \gamma/4k$. It is easy to verify that by our
assumptions above, the conditions of Lemma 3.1 are satisfied. Hence, the output $x$ of the
algorithm satisfies

$$\frac{\|Ax\|}{\|x\|} \ge \left(1-\frac{\gamma}{2}\right)\sigma_{k'}(A) \ge (1-\gamma/2)^2\,\sigma_1(A) \ge (1-\gamma)\sigma_1(A).$$



Remark 3.3. We will typically need $k$ in Corollary 3.2 to be relatively small compared to $n$. We
think of this as a mild assumption even when $k$ and $\gamma$ are constant. In particular, it is implied by
the assumption that $A$ has a good low-rank approximation for small $k$. Indeed, if $\sigma_{k+1} > (1-\gamma)\sigma_1$,
then the best rank-$k$ approximation to $A$ has spectral error greater than $(1-\gamma)\|A\|_2$.

3.1 Privacy-Preserving Power Iteration 

We will next turn the robust power iteration algorithm from the previous section into a 
privacy-preserving version. The algorithm is outlined below. 

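Input: Matrix $A \in \mathbb{R}^{n\times n}$, number of iterations $T \in \mathbb{N}$, privacy parameters $\varepsilon, \delta > 0$, upper
bound on coherence $C > 0$.

1. Let $\sigma = 2\varepsilon^{-1}\sqrt{4T\log(1/\delta)}$.

2. Let $x_0 = g_0 \sim N(0,1/n)^n$.

3. For $t = 1$ to $T$:

(a) If $\|x_{t-1}\|_\infty^2 > C/n$, terminate and output "fail".

(b) Let $g_t \sim N(0, \|x_{t-1}\|_\infty^2\sigma^2)^n$.

(c) Let $x'_t = Ax_{t-1} + g_t$.

(d) Put $x_t = x'_t/\|x'_t\|$.

Output: Vector $x_T \in \mathbb{R}^n$

Figure 2: Private power iteration (PPI)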
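
The steps of Figure 2 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not a vetted privacy implementation: the constant in the noise multiplier and the per-coordinate noise scale $\|x_{t-1}\|_\infty\sigma$ follow one reading of the figure, and the test matrix, the bound `C` and the seed are arbitrary choices for the example.

```python
import numpy as np

def private_power_iteration(A, T, eps, delta, C, rng):
    """Sketch of PPI: Gaussian noise scaled by the largest coordinate of x_{t-1}."""
    n = A.shape[0]
    # Noise multiplier; the leading constant is an assumption of this sketch.
    sigma = (2.0 / eps) * np.sqrt(4 * T * np.log(1.0 / delta))
    x = rng.normal(0.0, np.sqrt(1.0 / n), size=n)       # x_0 ~ N(0, 1/n)^n
    for _ in range(T):
        x_inf = np.abs(x).max()
        if x_inf ** 2 > C / n:                          # coherence check
            return None                                 # corresponds to output "fail"
        g = rng.normal(0.0, x_inf * sigma, size=n)      # std proportional to ||x||_inf
        y = A @ x + g
        x = y / np.linalg.norm(y)
    return x

n = 100
u = np.ones(n) / np.sqrt(n)                 # perfectly flat top singular vector
top = 5000.0
A = top * np.outer(u, u)                    # rank-1 matrix with sigma_1 = 5000

x = private_power_iteration(A, T=20, eps=1.0, delta=1e-6, C=50.0,
                            rng=np.random.default_rng(0))
quality = None if x is None else np.linalg.norm(A @ x) / top
```

In this example the top singular vector has all coordinates equal to $1/\sqrt{n}$, so the noise added in each round is small compared with $\sigma_1(A)$ and the returned vector captures almost all of the top singular value. A matrix with a spiky top singular vector would instead trigger the "fail" branch once the iterate concentrates on few coordinates.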

Lemma 3.4. The algorithm PPI satisfies $(\varepsilon,\delta)$-differential privacy.

Proof. By Theorem 2.6, the algorithm satisfies $(\varepsilon',\delta)$-differential privacy in each round. Here,
$\varepsilon'$ was chosen small enough so that Theorem 2.4 implies $(\varepsilon,\delta)$-differential privacy for the
algorithm overall. ■
The next lemma states the guarantees of the algorithm assuming that it successfully
terminates. 



Lemma 3.5. Let $\gamma > 0$. Let $A$ be a matrix satisfying $\sigma_{k+1}(A) \le (1-\gamma/2)\sigma_1(A)$ for some $k \ge 1$. Put
$T = 4\log(\sigma_1(A))$. Further assume that for some $\beta > 0$, $A$ satisfies

OrfcVClog(n)log(lA5) 
l|A|| 2 = — • (5) 

for some sufficiently large constant. Assume that PPI terminates successfully and outputs $x_T$
on input of $A$, $T$, and $C$. Then, with probability 9/10,

$$\|Ax_T\| \ge (1-\beta)\|A\|_2.$$



Proof. Our goal is to apply Corollary 3.2. For this we need to verify that $A$ and $g_t$ satisfy
the various assumptions of the corollary. Put $\Delta = \sqrt{4C\log(n)}\,\sigma$. With this choice of $\Delta$, we have by
basic Gaussian concentration bounds (see Lemma A.2):

1. $\mathbb{P}\{\|g_t\| > \Delta\} \le 1/n^2$.

2. $\mathbb{P}\{\|P_U g_t\| > \sqrt{k/n}\,\Delta\} \le 1/n^2$.

3. $\mathbb{P}\{\|P_V g_t\| > \sqrt{(n-k)/n}\,\Delta\} \le 1/n^2$.

Hence, with probability $1 - 1/n$, none of these events occur for any $t \in [T]$. This verifies that
the first assumption of Corollary 3.2 holds with high probability for this setting of $\Delta$. Further
note that, by Gaussian anti-concentration bounds (as stated in Lemma A.2), the following
claims are true:

• $\mathbb{P}\{\|P_U x_0\| \ge \sqrt{k/100n}\} \ge 98/100$

• $\mathbb{P}\{\|P_V x_0\| \ge \sqrt{(n-k)/100n}\} \ge 98/100$

Hence, both of these events occur with probability 96/100. On the other hand, the
second condition of Corollary 3.2 requires that $\|P_U x_0\| \ge O(k\Delta_U/\gamma\sigma_1(A))$. Assuming the event
$\|P_U x_0\| \ge \sqrt{k/100n}$ occurred, this corresponds to a lower bound of the form $\sigma_1(A) \ge O(k\Delta/\gamma)$,
which is satisfied by Equation 5. The analogous argument applies to $\|P_V x_0\|$. Finally, the
third condition of Corollary 3.2 follows by comparison with Equation 5. Hence, the lemma
follows. ■



4 Power Iteration and Incoherence 

We will next establish an important symmetry property of the algorithm. Specifically, we
will show that for any of the eigenvectors $u$ of $A$ (assuming $A$ is symmetric), the sign of
the correlation between $u$ and any intermediate vector $x_t$, i.e. $\mathrm{sign}(\langle u,x_t\rangle)$, is unbiased and
independent of $\mathrm{sign}(\langle v,x_t\rangle)$ for any other eigenvector $v$. This property is rather obvious in
the noise-free case where $x_t$ is simply proportional to $A^t x_0$. Hence, the sign of $\langle u,x_t\rangle$ is
determined by the sign of $\langle u,x_0\rangle$. Intuitively, the property continues to hold in the noisy case,
because the noise that we add is symmetric.
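
The noise-free observation can be checked on a toy example. The sketch below assumes a symmetric matrix with positive eigenvalues, so that multiplying by $A$ never flips the sign of a coefficient; the dimension, eigenvalues and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # columns play the role of eigenvectors u_i
eigenvalues = np.array([5.0, 4.0, 3.0, 2.0, 1.0])   # positive, so signs are preserved in every round
A = Q @ np.diag(eigenvalues) @ Q.T

x0 = rng.normal(0.0, np.sqrt(1.0 / n), size=n)      # x_0 ~ N(0, 1/n)^n
x = x0.copy()
for _ in range(5):                                  # noise-free power iteration
    x = A @ x
    x = x / np.linalg.norm(x)

signs_start = np.sign(Q.T @ x0)                     # sign(<u_i, x_0>) for every i
signs_end = np.sign(Q.T @ x)                        # sign(<u_i, x_t>) after 5 rounds
```

Since $x_0$ is a spherical Gaussian, the initial sign pattern is uniform over $\{-1,1\}^n$, and in this noise-free setting every iterate inherits exactly that pattern.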
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Lemma 4.1 (Sign Symmetry). Let Abe a symmetric matrix given in its eigendecomposition as 
A - YJi=\ a i u i u i Let t > and put X, = sign({ui, x t )) for i e [n]. T/ien (Xi,...,X„) is uniformly 
distributed in {-1,1}". 

Proof. We will establish by induction on t that the following two conditions hold for every 
t > : 

1. Yj(t) = (uj,x t ) is a symmetric random variable 

2. sign(Y;(f)) is independent of Yj(t) for all * z. 

Observe that these two conditions imply the statement of the lemma. In the base case notice 
that Y;(0) is just a random Gaussian variable N(0, 1/n) and hence symmetric. Now, let f ^ 1 
and consider 



Yi(t) 



(ui,Ax t _i+g t ) ai<M f ,x t _i> + <Mi,ft> a i Y i (*-l) + <M i ,ft> 



IIM-i+gtll 



IIM-i+gtll 



IIM-i+gtll 



Let $D_i = \sigma_i Y_i(t-1) + \langle u_i, g_t\rangle$. Notice that $D_i$ is a symmetric random variable, since it is the
sum of two independent symmetric random variables. Here we used the induction hypothesis
on $Y_i(t-1)$. We can see that $Y_i(t)$ is a rescaling of a symmetric random variable, but we also
need to show that the rescaling is independent of $\mathrm{sign}(D_i)$. Note that

$$\|A x_{t-1} + g_t\| = \Bigl\| \sum_{j=1}^n u_j \bigl(\sigma_j \langle u_j, x_{t-1}\rangle + \langle u_j, g_t\rangle\bigr) \Bigr\| = \sqrt{\sum_{j=1}^n D_j^2}.$$

This shows that the normalization term can be computed from $D_i^2$ and $\sigma_j Y_j(t-1) + \langle u_j, g_t\rangle$ for
$j \ne i$. Note that each of these terms is independent of $\mathrm{sign}(D_i)$. Here we used the induction
hypothesis on $Y_j(t-1)$ and the fact that the $\langle u_j, g_t\rangle$ are independent Gaussians for all $j \in [n]$. We
conclude that

$$Y_i(t) = \frac{D_i}{\sqrt{D_1^2 + \cdots + D_n^2}}$$

is a symmetric random variable.

It remains to show that $\mathrm{sign}(Y_i(t))$ is independent of $Y_j(t)$ for all $j \ne i$. We have already
shown that the normalization term appearing in $Y_j(t)$ is statistically independent of
$\mathrm{sign}(Y_i(t))$. Moreover, by the induction hypothesis, the numerator $\sigma_j Y_j(t-1) + \langle u_j, g_t\rangle$ is
statistically independent of $\mathrm{sign}(Y_i(t-1))$ and statistically independent of $\langle u_i, g_t\rangle$. In particular,
conditioning on any subset of the variables $Y_j(t)$, $j \ne i$, leaves the two variables $\mathrm{sign}(Y_i(t-1))$
and $\mathrm{sign}(\langle u_i, g_t\rangle)$ unbiased. This implies that no matter what the values of $|Y_i(t-1)|$ and $|\langle u_i, g_t\rangle|$
are, the variable $\mathrm{sign}(Y_i(t))$ is unbiased. ■
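
As an informal sanity check of Lemma 4.1 (not part of the proof), the following sketch simulates a noisy power iteration on a small diagonal matrix, whose eigenvectors are the standard basis vectors, and tallies the sign pattern of $x_t$ over many independent runs. The matrix, the noise level, the number of steps and the trial count are arbitrary choices made here for illustration, and the noise is plain i.i.d. Gaussian rather than the calibrated noise used by PPI.

```python
import math
import random
from collections import Counter

def noisy_power_step(diag, x, noise_std, rng):
    """One step x -> (A x + g) / ||A x + g|| for the diagonal matrix A = diag(diag)."""
    y = [d * xi + rng.gauss(0.0, noise_std) for d, xi in zip(diag, x)]
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]

def sign_pattern_counts(diag, steps, noise_std, trials, seed=0):
    """Tally (sign<u_1, x_t>, ..., sign<u_n, x_t>) over independent runs."""
    rng = random.Random(seed)
    n = len(diag)
    counts = Counter()
    for _ in range(trials):
        # x_0 has i.i.d. N(0, 1/n) coordinates, as in the base case of the proof.
        x = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
        for _ in range(steps):
            x = noisy_power_step(diag, x, noise_std, rng)
        counts[tuple(1 if v > 0 else -1 for v in x)] += 1
    return counts

TRIALS = 4000
counts = sign_pattern_counts(diag=[3.0, 1.0], steps=5, noise_std=0.5, trials=TRIALS)
freqs = {pattern: c / TRIALS for pattern, c in counts.items()}
```

With these settings each of the four sign patterns in $\{-1,1\}^2$ should appear with empirical frequency close to 1/4, which is what the lemma predicts.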

We will use the previous lemma to bound the $\ell_\infty$-norm of the intermediate vectors
$x_t$ arising in power iteration in terms of the coherence of the input matrix. We need the
following large deviation bound.

Lemma 4.2. Let $\alpha_1,\dots,\alpha_n$ be scalars such that $\sum_{i=1}^n \alpha_i^2 = 1$ and let $u_1,\dots,u_n$ be unit vectors in $\mathbb{R}^n$.
Put $B = \max_{i=1}^n \|u_i\|_\infty$. Further let $(s_1,\dots,s_n)$ be chosen uniformly at random in $\{-1,1\}^n$. Then,

$$\mathbb{P}\Bigl\{ \Bigl\| \sum_{i=1}^n s_i \alpha_i u_i \Bigr\|_\infty > 4B\sqrt{\log n} \Bigr\} \le \frac{1}{n^3}.$$

Proof. Let $X = \sum_{i=1}^n X_i$ where $X_i = s_i \alpha_i u_i$. We will bound the deviation of $X$ in each entry
and then take a union bound over all entries. Consider $Z = \sum_{i=1}^n Z_i$ where $Z_i$ is the first
entry of $X_i$. The argument is identical for all other entries of $X$. We have $\mathbb{E} Z = 0$ and
$\mathbb{E} Z^2 = \sum_{i=1}^n \mathbb{E} Z_i^2 \le B^2 \sum_{i=1}^n \alpha_i^2 = B^2$. Hence, by Theorem A.3 (Chernoff bound),

$$\mathbb{P}\bigl\{ |Z| > 4B\sqrt{\log n} \bigr\} \le \exp\Bigl( -\frac{16 B^2 \log(n)}{4B^2} \Bigr) \le \exp(-4\log(n)) = \frac{1}{n^4}.$$

The claim follows by taking a union bound over all $n$ entries of $X$. ■
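
The deviation bound can be illustrated numerically. The sketch below is only an illustration under assumptions chosen here: it takes the $u_i$ to be the normalized rows of a Sylvester-Hadamard matrix (so $B = 1/\sqrt{n}$), uses equal weights $\alpha_i = 1/\sqrt{n}$, and compares the observed $\ell_\infty$-norm of the randomly signed sum with the threshold $4B\sqrt{\log n}$.

```python
import math
import random

def hadamard_unit_vectors(n):
    """Rows of the n x n Sylvester-Hadamard matrix scaled to unit length (n a power of 2)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]

def signed_sum_inf_norm(alphas, vectors, signs):
    """Infinity norm of sum_i s_i * alpha_i * u_i for the given signs s_i."""
    n = len(vectors[0])
    total = [0.0] * n
    for s, alpha, u in zip(signs, alphas, vectors):
        for j in range(n):
            total[j] += s * alpha * u[j]
    return max(abs(v) for v in total)

n = 128
vectors = hadamard_unit_vectors(n)
alphas = [1.0 / math.sqrt(n)] * n              # sum of alpha_i^2 equals 1
B = max(max(abs(v) for v in u) for u in vectors)
threshold = 4.0 * B * math.sqrt(math.log(n))

rng = random.Random(0)
worst_random = max(
    signed_sum_inf_norm(alphas, vectors, [rng.choice((-1.0, 1.0)) for _ in range(n)])
    for _ in range(50)
)
# Adversarial signs for comparison: all +1 makes the first coordinate add up coherently.
all_plus = signed_sum_inf_norm(alphas, vectors, [1.0] * n)
```

In this setup the all-plus sign pattern reaches an $\ell_\infty$-norm of 1, which exceeds the threshold, while uniformly random signs stay below it. This is the gap that the sign symmetry of Lemma 4.1 is used to exploit.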

Lemma 4.3. Let $A \in \mathbb{R}^{n\times n}$. Suppose PPI is invoked on $A$ with $T \le n$, $C \ge 16\mu(A)\log(n)$ and
any choice of $\varepsilon, \delta > 0$. Then, with probability $1 - 1/n$, the algorithm terminates successfully after
round $T$.

Proof. The only way for the algorithm to terminate prematurely in step $t+1$ is if the vector
$x_t$ satisfies $\|x_t\|_\infty > 4\sqrt{\mu(A)\log(n)/n}$. We will argue that this happens with probability at most
$1/n^2$. Hence, by taking a union bound over all rounds $T \le n$, we conclude that the algorithm
terminates successfully with probability $1 - 1/n$.

Indeed, let $A = \sum_{i=1}^n \sigma_i u_i u_i^T$ be given in its eigendecomposition. Note that $B = \max_{i=1}^n \|u_i\|_\infty \le
\sqrt{\mu(A)/n}$. On the other hand, we can write $x_t = \sum_{i=1}^n s_i \alpha_i u_i$ where the $\alpha_i$ are non-negative scalars
such that $\sum_{i=1}^n \alpha_i^2 = 1$, and $s_i \in \{-1,1\}$. Notice that $s_i = \mathrm{sign}(\langle x_t, u_i\rangle)$. Hence, by Lemma 4.1,
the signs $(s_1,\dots,s_n)$ are distributed uniformly at random in $\{-1,1\}^n$. Hence, by Lemma 4.2, it
follows that

$$\mathbb{P}\bigl\{ \|x_t\|_\infty > 4B\sqrt{\log n} \bigr\} \le 1/n^3.$$

Hence, a union bound over all $t \in [T]$ completes the proof. ■

Finally, we can combine Lemma 3.5 and Lemma 4.3 to conclude that private power
iteration does not terminate prematurely and that the output vector achieves the desired
error bound.

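Theorem 4.4. Let $\gamma, \beta > 0$. Let $A$ be a matrix satisfying $\sigma_k \le (1 - \gamma/2)\sigma_1$ for some $k > 1$. Put
$T = 4\log(\sigma_1(A))$. Further assume $A$ satisfies

$$\|A\|_2 \ge \frac{\Theta\, T k \sqrt{\mu(A)\log(1/\delta)\log(n)}}{\varepsilon\gamma\beta} \qquad (6)$$

for some sufficiently large constant $\Theta > 0$. Then, with probability 8/10, on input of $A$, $T$, $(\varepsilon, \delta)$ and
$C \ge 16\mu(A)\log(n)$, the algorithm PPI outputs a vector $x$ such that

$$\frac{\|Ax\|}{\|x\|} \ge (1 - \beta)\|A\|_2.$$

Equivalently:

$$\frac{\|Ax\|}{\|x\|} \ge \sigma_1 - \frac{\Theta\, T k \sqrt{C\log(n)\log(1/\delta)}}{\varepsilon\gamma}.$$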

Proof. The proof follows directly by combining Lemma 3.5 applied with $C = 16\mu(A)\log(n)$
with Lemma 4.3. The latter lemma implies that for this setting the algorithm terminates with
probability $1 - 1/n$. The former lemma implies that the stated error bound holds in this case
with probability 9/10. Both events occur simultaneously with probability $9/10 - o(1)$. ■






Remark 4.5 (On choosing T and C). As stated, Theorem 4.4 requires the input to the algorithm to
depend on two sensitive quantities, i.e., $\sigma_1(A)$ and $\mu(A)$. It is easy to get rid of this using standard
techniques. We can upper bound $\sigma_1(A)$ by $\|A\|_1 = \sum_{i,j} |A_{ij}|$, which can be computed efficiently and
privately (as it is 1-sensitive). Since the dependence on $\sigma_1(A)$ in the choice of $T$ is only logarithmic,
this can change the error bounds only by constant factors. To get rid of $\mu(A)$, we can try all choices
of $C = 2^i$, $i \in \{0, 1, \dots, \log(n)\}$. Since $\mu(A) \le n$, this process will eventually find a setting of $C$ that
gives the right upper bound up to an overestimate of at most a factor 2. As we need to scale down
$(\varepsilon, \delta)$ by a $\log(n)$ factor in each execution, the error bounds deteriorate by an $O(\log(n))$-factor. This
loss can be replaced by $O(\log\log n)$ using the exponential mechanism [MT07]. We omit the details
as they are standard.
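
The doubling search over $C$ described in the remark can be sketched as follows. This is a minimal illustration under assumptions made here: the function names are hypothetical, the budget is split evenly across the runs (basic composition), and the private selection among the runs, which the remark leaves to standard techniques, is not shown.

```python
import math

def coherence_grid(n):
    """Candidate bounds C = 2^i for i = 0, ..., ceil(log2(n))."""
    return [2 ** i for i in range(math.ceil(math.log2(n)) + 1)]

def per_run_budget(eps, delta, n):
    """Split (eps, delta) evenly across one run per candidate value of C."""
    runs = len(coherence_grid(n))
    return eps / runs, delta / runs

def first_sufficient(n, required):
    """Smallest grid value that is at least `required` (assumes 1 <= required <= n)."""
    return next(c for c in coherence_grid(n) if c >= required)

grid = coherence_grid(1024)
eps_run, delta_run = per_run_budget(1.1, 1e-6, 1024)
chosen = first_sufficient(1024, 37.5)
```

Because consecutive candidates differ by a factor of 2, the first candidate that is large enough overshoots the required value by less than a factor of 2, matching the overestimate mentioned in the remark.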

5 Rank k approximations and Deflation 

In this section, we show how to successively call our algorithm for obtaining rank 1 approx- 
imations to obtain a rank k approximation. To do this, we need to argue two things. First, 
we must argue that approximately optimal rank 1 approximations to successively 'deflated' 
versions of our original matrix can be combined to yield an approximately optimal rank 
k approximation. Second, we must argue that incoherence is propagated throughout the 
deflation process, so that we can in fact obtain good rank 1 approximations to the deflated 
matrices. 

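Input: Matrix $A \in \mathbb{R}^{n\times n}$, target rank $k$, number of iterations $T \in \mathbb{N}$, privacy parameters
$\varepsilon, \delta > 0$, upper bound on coherence $C > 0$.

1. Let $\varepsilon' = \varepsilon/\sqrt{4k\ln(1/\delta)}$, $\delta' = \delta/k$.

2. Let $A_0 \leftarrow A$, $B_0 \leftarrow 0$.

3. For $i = 1$ to $k$:

(a) Let $v_i \leftarrow \mathrm{PPI}(A_{i-1}, T, \varepsilon', \delta', C)$

(b) Let $\hat\sigma_i = \|A_{i-1} v_i\|_2 + \mathrm{Lap}(1/\varepsilon')$

(c) Let $A_i \leftarrow A_{i-1} - \hat\sigma_i v_i v_i^T$, $B_i \leftarrow B_{i-1} + \hat\sigma_i v_i v_i^T$.

Output: Matrix $A_k$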

Figure 3: Rank k approximation (rank-k). 
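
The deflation loop of Figure 3 can be sketched in plain Python as follows. This is an illustration only and makes several assumptions: `noisy_power_iteration` is a hypothetical stand-in for PPI (a power iteration with Gaussian noise whose scale is chosen here for illustration, without PPI's early-termination check), so the sketch carries no privacy guarantee. The sketch returns both the deflated matrix $A_k$ and the accumulated matrix $B_k$, along with the noisy singular value estimates.

```python
import math
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def noisy_power_iteration(A, T, eps, delta, C, rng):
    """Hypothetical stand-in for PPI: power iteration with Gaussian noise added each round."""
    n = len(A)
    # Assumed noise scale, for illustration only (not taken from the definition of PPI).
    std = math.sqrt(4 * T * math.log(1 / delta) * C / n) / eps
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [v / norm(x) for v in x]
    for _ in range(T):
        y = [v + rng.gauss(0.0, std) for v in matvec(A, x)]
        x = [v / norm(y) for v in y]
    return x

def laplace(scale, rng):
    """Laplace(scale) sample as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def rank_k(A, k, T, eps, delta, C, rng):
    """Steps 1-3 of Figure 3 with the stand-in above in place of PPI."""
    n = len(A)
    eps_i = eps / math.sqrt(4 * k * math.log(1 / delta))   # step 1
    delta_i = delta / k
    A_i = [row[:] for row in A]                            # step 2
    B_i = [[0.0] * n for _ in range(n)]
    sigmas = []
    for _ in range(k):                                     # step 3
        v = noisy_power_iteration(A_i, T, eps_i, delta_i, C, rng)
        s = norm(matvec(A_i, v)) + laplace(1 / eps_i, rng)
        for a in range(n):
            for b in range(n):
                A_i[a][b] -= s * v[a] * v[b]
                B_i[a][b] += s * v[a] * v[b]
        sigmas.append(s)
    return A_i, B_i, sigmas

# Usage on a diagonal matrix with a very large eps, so that the added noise is negligible.
diag = [10.0, 5.0, 1.0, 0.5, 0.2, 0.1]
A = [[diag[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
A_k, B_k, sigmas = rank_k(A, k=2, T=200, eps=1e6, delta=1e-6, C=6.0, rng=random.Random(0))
```

In this low-noise usage the two estimates in `sigmas` come out close to the top two singular values 10 and 5, and $A_k + B_k$ reproduces $A$ up to rounding, since every deflation step moves the same rank-1 term from one matrix to the other.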

Our analysis will be based on a useful lemma of Kapralov and Talwar, that shows that the 
standard "matrix deflation" method can be applied even given only approximate eigenvectors. 
The lemma here is actually an easy modification of the lemma from [KT13]. The details can
be found in Appendix B.

Lemma 5.1 (Deflation Lemma [KT13]). Let $A$ be a symmetric matrix with eigenvalues $\lambda_1 \ge \dots \ge
\lambda_n$. There exists a constant $C > 0$ so that the following holds. Let $x$ be any unit vector such that
$\|Ax\| \ge (1 - \alpha/C)\lambda_1$, where $\alpha \in (0,1)$. Let $A' = A - t\,x x^T$, where $t \in (1 \pm \alpha/C)\|Ax\|$. Denote the
eigenvalues of $A'$ by $\lambda'_1 \ge \dots \ge \lambda'_n$.

1. $\lambda_k \le \lambda'_{k-1} \le \min(\lambda_{k-1}, \lambda_k + \alpha\lambda_1)$ for each $k \in \{1,\dots,n\}$.

We now argue that deflation preserves incoherence. Here, we make use of two lemmas 
from [HR12]. 

Definition 5.2 ($\mu_0$-coherence). Let $U$ be an $n \times r$ matrix with orthonormal columns and $r \le n$.
Recall that $P_U = UU^T$. The $\mu_0$-coherence of $U$ is defined as

$$\mu_0(U) = \frac{n}{r} \max_{1\le j\le n} \|P_U e_j\|^2 = \frac{n}{r} \max_{1\le j\le n} \|U_{(j)}\|^2. \qquad (7)$$

Here, $e_j$ denotes the $j$-th $n$-dimensional standard basis vector and $U_{(j)}$ denotes the $j$-th row of
$U$. The $\mu_0$-coherence of an $n \times n$ matrix $A$ of rank $r$ given in its singular value decomposition
$U\Sigma V^T$, where $U \in \mathbb{R}^{n\times r}$, is defined as $\mu_0(U)$.

Observe that we always have $\mu_0(A) \le \mu(A)$.
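
To make the definition concrete, the following sketch computes the $\mu_0$-coherence of a matrix with orthonormal columns directly from its rows. The second function is an assumption made here for comparison: it takes the entrywise coherence to be $n$ times the largest squared entry of $U$, which is how $\mu$ is used in Section 4 for the eigenvectors. The two example matrices are hypothetical.

```python
def mu0_coherence(U):
    """(n / r) * max_j ||U_(j)||^2 for an n x r matrix U with orthonormal columns."""
    n, r = len(U), len(U[0])
    return (n / r) * max(sum(v * v for v in row) for row in U)

def entrywise_coherence(U):
    """n * max over columns of the squared sup-norm (assumed form of mu, for comparison)."""
    n = len(U)
    return n * max(v * v for row in U for v in row)

# Two columns of the 4 x 4 identity: all mass sits on single coordinates.
spiky = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
# Two columns of a scaled 4 x 4 Hadamard matrix: mass is spread evenly.
flat = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
```

For these inputs `mu0_coherence(spiky)` is 2.0, the largest possible value $n/r$, while `mu0_coherence(flat)` is 1.0, the smallest possible value. In both cases the $\mu_0$ value does not exceed the entrywise one.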

Lemma 5.3 ([HR12]). Let $u_1,\dots,u_r \in \mathbb{R}^n$ be orthonormal vectors. Pick unit vectors $n_1,\dots,n_k \in
S^{n-1}$ uniformly at random. Assume that

$$n \ge c_0\, k(r+k)\log(r+k) \qquad (8)$$

where $c_0$ is a sufficiently large constant. Then, there exists a set of orthonormal vectors $v_1,\dots,v_{r+k} \in
\mathbb{R}^n$ such that $\mathrm{span}\{v_1,\dots,v_{r+k}\} = \mathrm{span}\{u_1,\dots,u_r,n_1,\dots,n_k\}$ and furthermore, with probability
99/100,

$$\mu_0([v_1 \mid \cdots \mid v_{r+k}]) \le 2\mu_0([u_1 \mid \cdots \mid u_r]) + O\Bigl(\frac{k\log n}{r}\Bigr).$$

Lemma 5.4 ([HR12]). Let $U$ be an orthonormal $n \times r$ matrix. Suppose $w \in \mathrm{range}(U)$ and $\|w\| = 1$.
Then,

$$\|w\|_\infty^2 \le \frac{r}{n}\,\mu_0(U).$$

Lemma 5.5. Let $A \in \mathbb{R}^{n\times n}$ be a matrix. Define a set of vectors $s_1,\dots,s_k$ and matrices $A'_1,\dots,A'_k$
as follows. Let $A'_0 = A$. For each $i$, $s_i = A'_{i-1} t_i + c_i n_i$, where $t_i \in \mathbb{R}^n$ is an arbitrary vector, $c_i$ is
an arbitrary real number, and $n_i \in S^{n-1}$ is selected uniformly at random. Let $A'_i = A'_{i-1} - d_i s_i s_i^T$,
where $d_i$ is an arbitrary real number. Then for all $i$:

$$\mu(A'_i) \le 2r\mu_0(A) + O(i\log n).$$

Proof. We write $A = \sum_{j=1}^r \sigma_j u_j v_j^T$. The proof will follow easily from Lemma 5.3 and the
following claim.

Claim 5.6. For $i \in \{0,\dots,k\}$, let $w_1,\dots,w_{r+i}$ denote the left singular vectors of $A'_i$. Then, $w_1,\dots,w_{r+i} \in
\mathrm{span}(u_1,\dots,u_r,n_1,\dots,n_i)$.

Proof. We prove this by induction. The claim is immediate when $i = 0$, which forms the
base case. For the inductive case, consider $A'_i = A'_{i-1} - d_i s_i s_i^T$. Write the singular value
decomposition of $A'_{i-1}$ as $A'_{i-1} = \sum_{j=1}^{r+i-1} \sigma'_j y_j z_j^T$, and write the singular value decomposition
of $A'_i$ as $A'_i = \sum_{j=1}^{r+i} \lambda_j w_j x_j^T$. We can also write $A'_i = \sum_{j=1}^{r+i-1} \sigma'_j y_j z_j^T - d_i s_i s_i^T$. Therefore, for all $j$,

$$\lambda_j w_j = A'_i x_j = \sum_{\ell=1}^{r+i-1} \bigl(\sigma'_\ell \langle x_j, z_\ell\rangle\bigr) y_\ell - \bigl(d_i \langle x_j, s_i\rangle\bigr) s_i.$$

Therefore, $w_j \in \mathrm{span}(y_1,\dots,y_{r+i-1},s_i)$. But $s_i = A'_{i-1} t_i + c_i n_i$, so $s_i \in \mathrm{span}(y_1,\dots,y_{r+i-1},n_i)$, and
by our inductive assumption, $y_1,\dots,y_{r+i-1} \in \mathrm{span}(u_1,\dots,u_r,n_1,\dots,n_{i-1})$. Therefore, we can
conclude that $w_j \in \mathrm{span}(u_1,\dots,u_r,n_1,\dots,n_i)$ for all $j$. ■

By Lemma 5.3, we can conclude that for all $j$, $w_j \in \mathrm{span}(v'_1,\dots,v'_{r+i})$ such that $v'_1,\dots,v'_{r+i}$
are orthonormal with:

$$\mu_0([v'_1 \mid \cdots \mid v'_{r+i}]) \le 2\mu_0(A) + O\Bigl(\frac{i\log n}{r}\Bigr).$$

Therefore, we have:

$$\mu(A'_i) = n \cdot \max_{j\in[r+i]} \|w_j\|_\infty^2 \le r\,\mu_0([v'_1 \mid \cdots \mid v'_{r+i}]) \le 2r\mu_0(A) + O(i\log n)$$

where the first inequality follows from Lemma 5.4. ■

We are now ready to state our results for obtaining good rank k approximations in the 
spectral norm. First, we translate our bounds from Section 4 into a statement about rank-1 
matrix approximation. 

Theorem 5.7. Let $\gamma, \beta > 0$. Let $A$ be a matrix satisfying $\sigma_c \le (1 - \gamma/2)\sigma_1$ for some $c \ge 1$. Put
$T = 4\log(\sigma_1(A))$. Further assume $A$ satisfies

$$\|A\|_2 \ge \frac{\Theta\, T c \sqrt{\mu(A)\log(1/\delta)\log(n)}}{\varepsilon\gamma\beta} \qquad (9)$$

for some sufficiently large constant $\Theta > 0$. Then, with probability 7/10, on input of $A$, $T$, $(\varepsilon, \delta)$ and
$C \ge 9\mu(A)\log(n)$, the algorithm rank-k$(A, T, \varepsilon, \delta, 1)$ outputs a rank 1 matrix $A_1$ such that:

$$\|A - A_1\|_2 \le \sigma_2(A) + \beta\sigma_1(A).$$

Proof. This follows directly from Theorem 4.4, together with Corollary B.4, and the observa- 
tion that: 

IP W\ - oi | > c • jgff! } = exp (-e'/Joi ) = O (fT cT VJ^X ) = o(l) 

Therefore, with probability at least 7/10, both of the hypotheses of Corollary B.4 are satisfied. 

■ 

Our rank $k$ approximation result follows similarly, but we lose a factor of $\sqrt{r}$, where $r$ is
the rank of the initial matrix to be approximated, due to the potential degradation in matrix
coherence during the deflation process. It is not clear whether this factor of $\sqrt{r}$ is necessary, or
whether it is an artifact of our analysis.

Theorem 5.8. Let $\gamma, \beta > 0$. Let $A$ be a rank $r$ matrix such that there exist indices $c_1,\dots,c_k$
such that for each $i$: $\sigma_{c_i} \le (1 - \gamma/2)(\sigma_i - (i-1)\beta\sigma_1)$. Put $T = 4\log(\sigma_1(A))$. Further assume $A$
satisfies

$$\sigma_i(A) \ge \frac{\Theta\, T c_k \sqrt{(r\mu(A) + \sqrt{k}\log n)\log(k/\delta)\log(n)}}{\varepsilon\gamma\beta} \qquad (10)$$

for each $i \in [k]$, for some sufficiently large constant $\Theta > 0$. Then, with probability 7/10, on input
of $A$, $T$, $(\varepsilon, \delta)$ and $C \ge 9\mu(A)\log(n)$, the algorithm rank-k$(A, k, T, \varepsilon, \delta)$ outputs a rank $k$ matrix $A_k$
such that:

$$\|A - A_k\|_2 \le \sigma_{k+1}(A) + k\beta\sigma_1(A).$$

Proof. This follows directly from Theorem 4.4, together with Corollary B.4, and our bound
on the degradation of the coherence of $A$ under deflation, Lemma 5.5. ■



6 Lower Bound 

In this section, we prove a lower bound showing that our dependence on the coherence
$\mu$ is tight. For every value of $\mu \in [2, n]$, there is a family of $n \times n$ matrices such that no
$\varepsilon$-differentially private algorithm $\mathcal{A}$ can compute a vector $\mathcal{A}(M) = v$ with the guarantee that

$$\|Mv\| \ge \sigma_1 - o\Bigl(\frac{\sqrt{\mu}}{\varepsilon}\Bigr).$$

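Theorem 6.1. For every value of $C \in [2, n]$, there is a family of $n \times n$ matrices $\mathcal{M}_C$ such that:

1. For every $M \in \mathcal{M}_C$, $\mu(M) = C$.

2. For any $\delta = \Omega(1/n)$, no $(\varepsilon, \delta)$-differentially private algorithm $\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^n$ has the
guarantee that for every $M \in \mathcal{M}_C$, with constant probability, $\mathcal{A}(M) = v$ such that

$$\frac{\|Mv\|}{\|v\|} \ge \sigma_1(M) - o\Bigl(\frac{\sqrt{C}}{\varepsilon}\Bigr).$$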

Remark 6.2. Note that this theorem shows that our upper bound for computing rank 1 approximations
to incoherent matrices is tight along the entire curve of values $\mu$, up to logarithmic
factors.

Proof. For each $C \in [2, n]$, we define our family of matrices $\mathcal{M}_C$ as follows. Let $\mathcal{D} \subset \mathbb{R}^n$ be the
set of boolean valued vectors with exactly $n/2$ non-zero entries: $\mathcal{D} = \{D \in \{0,1\}^n : \|D\|_1 = n/2\}$.
We will intuitively think of each $D \in \mathcal{D}$ as a private bit-valued database, whose entries we
are protecting with a guarantee of differential privacy. For each $D \in \mathcal{D}$, let $\hat D = D/\|D\|_2$ be
the rescaling of $D$ to a unit vector. Note that $\hat D \in \{0, \sqrt{2}/\sqrt{n}\}^n$. Define $s(C) = n/C$, and $u \in \mathbb{R}^n$
to be the vector such that $u_i = 1/\sqrt{s(C)}$ for $i \in \{1,\dots,s(C)\}$ and $u_i = 0$ for $i > s(C)$. Finally, we
define our class of matrices $\mathcal{M}_C$ to be:

$$\mathcal{M}_C = \Bigl\{ M_D : M_D = \sqrt{n\,s(C)/2}\; u \cdot \hat D^T,\; D \in \mathcal{D} \Bigr\}.$$

Note that each $M_D \in \mathcal{M}_C$ is a matrix in which the first $s(C)$ rows are identical copies of the
database $D \in \{0,1\}^n$, and the remaining $n - s(C)$ rows are the zero vector. Moreover, for each
$M \in \mathcal{M}_C$, we have

$$\sigma_1(M) = \sqrt{\frac{n\,s(C)}{2}} = \frac{n}{\sqrt{2C}} \quad\text{and}\quad \mu(M) = n \cdot \max\Bigl(\frac{1}{s(C)}, \frac{2}{n}\Bigr) = \frac{n}{s(C)} = C.$$

Now consider any unit vector $v$. For each $M_D \in \mathcal{M}_C$, we have:

$$\|M_D v\|_2 = \sqrt{\frac{n\,s(C)}{2}}\;|\langle \hat D, v\rangle| = \sigma_1(M_D)\,|\langle \hat D, v\rangle|. \qquad (11)$$

Therefore, any unit vector $v$ such that $\|M_D v\|_2 \ge (1 - 1/1000)\,\sigma_1(M_D)$ must be such that $\langle \hat D, v\rangle \ge 1 - 1/1000$.
However, if we view D as being a private database, it is clear that it is not possible to privately 
approximate it well: 

Lemma 6.3. For $\delta \le 1/5$, let $\mathcal{B} : \mathbb{R}^n \to \mathbb{R}^n$ be a $(1, \delta)$-differentially private algorithm with respect
to the entries of its input vector. Let $D \in \mathcal{D}$ be chosen uniformly at random. Then with probability
at least 1/2, $\mathcal{B}(D) = v$ such that $\langle \hat D, v/\|v\|_2\rangle \le 1 - 1/1000$.

Proof. Let $D \in \mathcal{D}$ be a randomly chosen database $D \in \mathbb{R}^n$ with $\|D\|_1 = n/2$ entries. Let
$\hat D = D/\|D\|_2$ be its normalization to a unit vector. Suppose that $v \in \mathbb{R}^n$ is a unit vector such
that $\langle \hat D, v\rangle \ge 1 - \alpha$. We may therefore write:



Let $D^*$ denote the vector that results from setting $D^*_i = 1$ in each coordinate $i$ in which
> 1/2, and setting D* - in all other coordinates. Since \\*JjV ^ ^(5^Ja), it 

follows that \\D* -D\\ ^ ^(6^Ja). Now consider the probability that a randomly chosen index 
$i \in \{i : D_i > 0\}$ is such that $D^*_i > 0$. This occurs with probability at least $1 - 6\sqrt{\alpha}$. On the
other hand, consider the probability that a randomly chosen index $j \in \{i : D_i = 0\}$ is such that
$D^*_j > 0$. This occurs with probability at most $6\sqrt{\alpha}$. Note also that because (over the random
choice of $D$) each index $i \in D$ is set to 1 uniformly at random, $i$ and $j$ are drawn from the
same marginal distribution. Finally, consider the neighboring database $D' = D - \{i\} + \{j\}$,
and note that $D'$ is also uniformly distributed among the set of databases $\mathcal{D}$. We therefore
have that by differential privacy: $1 - 6\sqrt{\alpha} \le e \cdot 6\sqrt{\alpha} + \delta$. If $\alpha \le 1/1000$ and $\delta \le 1/5$, this is a
contradiction.




where $D^\perp$ is some unit vector orthogonal to $\hat D$. We therefore have:

It remains to observe that changing a single entry of $D$ results in changing $s(C) = n/C$
entries of $M_D$. Therefore, by the composition properties of differential privacy, any algorithm
$\mathcal{A} : \mathbb{R}^{n\times n} \to \mathbb{R}^n$ which is $(\varepsilon, \delta)$-differentially private with respect to entry changes in its input is
$((n/C)\varepsilon, (n/C)\delta)$-differentially private with respect to entry changes in $D$ when given $M_D$ as
input. Therefore, Lemma 6.3 taken together with equation 11 implies that if $\varepsilon \le C/n$ and
$\delta \le C/(5n)$, then no $(\varepsilon, \delta)$-differentially private algorithm, when given as input a uniformly
randomly chosen matrix $M_D \in \mathcal{M}_C$, can with probability greater than 1/2 return a vector
$\mathcal{A}(M_D) = v$ such that

$$\|M_D v\|_2 \ge \sigma_1 (1 - 1/1000) = \sigma_1 - \frac{n}{1000\sqrt{2C}}.$$

Finally, for point of contradiction, suppose that there was an $\varepsilon$-differentially private
algorithm that for every matrix $M$, with probability greater than 1/2, returned a vector
$\mathcal{A}(M) = v$ such that $\|Mv\| \ge \sigma_1(M) - o(\sqrt{C}/\varepsilon)$. Letting $\varepsilon = C/n$, and letting $M = M_D$ be chosen
from $\mathcal{M}_C$, we would have that:

$$\|Mv\|_2 \ge \sigma_1(M) - o\Bigl(\frac{n}{\sqrt{C}}\Bigr),$$

which is a direct contradiction. This completes the proof.
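
The hard instances $M_D$ used in the proof above can be built explicitly for small parameters. The sketch below is an illustration under assumptions chosen here: it takes $C$ to divide $n$, uses one hypothetical database $D$ with $n = 8$, and obtains the top singular value from the Frobenius norm, which is valid because $M_D$ has rank 1.

```python
import math

def hard_instance(D, C):
    """M_D = sqrt(n * s / 2) * u * D_hat^T with s = n / C (assumes C divides n)."""
    n = len(D)
    s = n // C
    d_norm = math.sqrt(sum(v * v for v in D))
    d_hat = [v / d_norm for v in D]                      # D rescaled to a unit vector
    u = [1.0 / math.sqrt(s) if i < s else 0.0 for i in range(n)]
    scale = math.sqrt(n * s / 2.0)
    return [[scale * ui * dj for dj in d_hat] for ui in u]

def frobenius_norm(M):
    return math.sqrt(sum(v * v for row in M for v in row))

D = [1, 0, 1, 1, 0, 0, 1, 0]       # n = 8 with exactly n/2 ones
C = 2
M = hard_instance(D, C)
sigma_1 = frobenius_norm(M)         # M has rank 1, so this equals its top singular value
```

Here the first $s(C) = 4$ rows of `M` are copies of `D`, the remaining rows are zero, and `sigma_1` equals $n/\sqrt{2C} = 4$, in line with the properties stated in the proof.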



7 Conclusions and Open Problems 

We have shown nearly optimal data dependent bounds for privately computing the top
singular vector of a matrix, in terms of its $\mu$-coherence. We conclude with several specific
open problems, as well as a general research agenda.

Specifically, it would be nice to resolve the following technical questions: 

1. We have shown that our dependence on $\mu$-coherence is tight, but it remains possible
that there might be a weaker notion of coherence that this or other algorithms could
take advantage of. One candidate is $\mu_k$-coherence, which only bounds the magnitude
of the entries in the top $k$ singular vectors, and leaves the others unconstrained. We
do not know how to show that $\mu_k$-coherence is sufficient to bound the quality of the
approximation to the top singular vector. However, as evidence for this conjecture,
in Appendix C, we show that the local sensitivity of the powering operation can be
bounded in terms of $\mu_k$-coherence.

2. When we "deflate" $A$ so as to recurse and compute an approximation to the higher
singular vectors, we lose a one-time factor of $\sqrt{r}$ in the coherence, where $r$ is the rank
of the matrix. As a result, our bounds for rank $k$ approximation for $k \ge 2$ have a
dependence on the matrix rank. Can this factor of $\sqrt{r}$ be removed, or is it inherent?

More generally, this paper is an instance of a broader research agenda: overcoming worst- 
case lower bounds in differential privacy by giving data-dependent accuracy bounds. In 
many settings (especially if the data set is small), the worst case bounds necessary to achieve
differential privacy can be prohibitive. However, natural data sets tend to have structural
properties (like low coherence) that can potentially be taken advantage of in a variety of
settings. It would be interesting to understand the relevant features of the data that allow
more accurate private analyses in domains other than spectral analysis.

A Deviation bounds

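Lemma A.1 (Gaussian Anti-Concentration). Let $g_i \sim N(0, 1)$ and let $a_i \ge 0$ for $1 \le i \le n$. Then,
for every $\gamma > 0$,

$$\mathbb{P}\Bigl\{ \sum_{i=1}^n a_i g_i^2 \le \gamma \sum_{i=1}^n a_i \Bigr\} \le \sqrt{e\gamma}.$$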

We thank George Lowther for pointing out the following proof. 

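Proof. We may assume without loss of generality that $\sum_{i=1}^n a_i = 1$ and $\gamma < 1$. For every $\lambda > 0$,
we have

$$\mathbb{P}\Bigl\{ \sum_{i=1}^n a_i g_i^2 \le \gamma \sum_{i=1}^n a_i \Bigr\}
\le \mathbb{E}\, e^{\lambda\gamma - \lambda \sum_i a_i g_i^2}
= e^{\lambda\gamma} \prod_{i=1}^n \mathbb{E}\, e^{-\lambda a_i g_i^2}
= e^{\lambda\gamma} \prod_{i=1}^n (1 + 2\lambda a_i)^{-1/2}
\le e^{\lambda\gamma} (1 + 2\lambda)^{-1/2}.$$

The claim follows by setting $\lambda = (\gamma^{-1} - 1)/2$.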

The following direct consequence was needed earlier.

Lemma A.2. Let $U$ be a $k$-dimensional subspace of $\mathbb{R}^n$ and let $g \sim N(0, 1)^n$. Then,

1. for every $\gamma > 0$, $\mathbb{P}\{\|P_U g\| \le \sqrt{\gamma k}\} \le \sqrt{e\gamma}$.

2. for every t ^ 1, we have ^\\\Pug\\ > Vw < exp(-f). 



Proof. The first claim follows directly by Lemma A.l. The second can be verified by direct 
computation. ■ 

Theorem A.3 (Chernoff bound). Let the random variables $X_1,\dots,X_m$ be independent random
variables. Let $X = \sum_{i=1}^m X_i$ and let $\sigma^2 = \mathbb{V} X$. Then, for any $t > 0$,

$$\mathbb{P}\{|X - \mathbb{E}X| > t\} \le \exp\Bigl( -\frac{t^2}{4\sigma^2} \Bigr).$$

B Proofs from Section 5



Lemma B.1 (Deflation Lemma [KT13]). Let $A$ be a symmetric matrix with eigenvalues $\lambda_1 \ge
\dots \ge \lambda_n$. There exists a constant $C > 0$ so that the following holds. Let $x$ be any unit vector such
that $x^T A x \ge (1 - \alpha/C)\lambda_1$, where $\alpha \in (0, 1)$. Let $A' = A - t\,x x^T$, where $t \in (1 \pm \alpha/C)\,x^T A x$. Denote
the eigenvalues of $A'$ by $\lambda'_1 \ge \dots \ge \lambda'_n$.

1. $\lambda_k \le \lambda'_{k-1} \le \min(\lambda_{k-1}, \lambda_k + \alpha\lambda_1)$ for each $k \in \{1,\dots,n\}$.

Because our algorithm returns a vector v with a guarantee on the quantity \\Av\\, rather 
than on the Rayleigh Quotient v T Av, we must relate these two quantities, which we do 
presently. 

Lemma B.2. For any unit vector $x$, $|x^T A x| \le \|Ax\|$.

Proof. By the Cauchy-Schwarz inequality, $|x^T A x| = |\langle x, Ax\rangle| \le \|x\| \cdot \|Ax\| = \|Ax\|$. ■

We now prove a partial converse, for vectors $x$ such that $\|Ax\|$ is large.

Lemma B.3. For any $0 \le \alpha \le 1/4$ and for any unit vector $x$ such that $\|Ax\| \ge (1 - \alpha)\lambda_1$,

$$x^T A x \ge (1 - 5\alpha)\lambda_1.$$

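Proof. Let $v_1,\dots,v_n$ denote the eigenvectors of $A$ in order from largest to smallest eigenvalue:
$|\lambda_1| \ge \dots \ge |\lambda_n|$. Then we may write $x = \sum_{i=1}^n \alpha_i v_i$. Likewise $Ax = \sum_{i=1}^n \alpha_i \lambda_i v_i$, where
$\sum_{i=1}^n \alpha_i^2 = 1$, since $x$ is a unit vector. Hence, $\|Ax\| = \sqrt{\sum_{i=1}^n \alpha_i^2 \lambda_i^2}$ and $x^T A x = \langle x, Ax\rangle = \sum_{i=1}^n \alpha_i^2 \lambda_i$.
We define:

$$i^* = \max\{1 \le i \le n : |\lambda_i| \ge \lambda_1(1 - 4\alpha)\}$$

to be the largest index such that the $i^*$-th eigenvalue has magnitude at least $(1 - 4\alpha)\lambda_1$. Now
define the quantities:

$$S_1 = \sum_{i=1}^{i^*} \alpha_i^2, \qquad S_2 = \sum_{i=i^*+1}^{n} \alpha_i^2$$

and note that $S_2 = 1 - S_1$. We can calculate:

$$\begin{aligned}
(1 - \alpha)\lambda_1 &\le \sqrt{\sum_{i=1}^{i^*} \alpha_i^2 \lambda_i^2 + \sum_{i=i^*+1}^{n} \alpha_i^2 \lambda_i^2} \\
&\le \sqrt{S_1 \lambda_1^2 + S_2 (1 - 4\alpha)\lambda_1^2} \\
&\le \lambda_1 \bigl(\sqrt{S_1} + \sqrt{1 - S_1}\,\sqrt{1 - 4\alpha}\bigr) \\
&\le \lambda_1 \bigl(\sqrt{S_1} + \sqrt{1 - S_1}\,(1 - 2\alpha)\bigr).
\end{aligned}$$

Solving for $S_1$, we find:

$$S_1 \ge \frac{1 - 4\alpha + 8\alpha^2 - 10\alpha^3 + 6\alpha^4 + (-1 + \alpha)(-1 + 2\alpha)\sqrt{1 + \alpha(-2 + 3\alpha)}}{2\bigl(1 + 2(-1 + \alpha)\alpha\bigr)^2} \ge 1 - 4\alpha^2.$$

Therefore, we also have $S_2 \le 4\alpha^2$.

Finally, we may calculate:

$$x^T A x = \sum_{i=1}^n \alpha_i^2 \lambda_i \ge S_1 (1 - 4\alpha)\lambda_1 - S_2 (1 - 4\alpha)\lambda_1 \ge (1 - 8\alpha^2)(1 - 4\alpha)\lambda_1 \ge (1 - 5\alpha)\lambda_1$$

where the last inequality holds since $\alpha \le 1/4$.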



As a corollary we get a modified version of the deflation lemma of [KT13].

Corollary B.4 (Modified Deflation Lemma [KT13]). Let $A$ be a symmetric matrix with eigenvalues
$\lambda_1 \ge \dots \ge \lambda_n$. There exists a constant $C > 0$ so that the following holds. Let $x$ be any unit
vector such that $\|Ax\| \ge (1 - \alpha/C)\lambda_1$, where $\alpha \in (0, 1)$. Let $A' = A - t\,x x^T$, where $t \in (1 \pm \alpha/C)\|Ax\|$.
Denote the eigenvalues of $A'$ by $\lambda'_1 \ge \dots \ge \lambda'_n$.

1. $\lambda_k \le \lambda'_{k-1} \le \min(\lambda_{k-1}, \lambda_k + \alpha\lambda_1)$ for each $k \in \{1,\dots,n\}$.

C Perturbation bounds for matrix powers 

In this section we prove a perturbation bound for matrix powers. The result can be seen as
bounding the so-called local sensitivity of power iteration. Notably, we can use the following
weaker notion of $\mu$-coherence, which depends only on the top few singular
vectors.

Definition C.1. Let $M$ be a real valued $m \times n$ matrix with singular value decomposition
$M = \sum_{i=1}^n \sigma_i u_i v_i^T$, where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$. We define the top-$k$ coherence of $M$ as

$$\mu_k(M) = \max_{1 \le i \le k} \max\bigl\{ m\|u_i\|_\infty^2,\; n\|v_i\|_\infty^2 \bigr\}.$$

Note that $1 \le \mu_k(M) \le \max\{m, n\}$.

Theorem C.2. Let $q \ge 1$ be a number. Let $M$ be a real valued $n \times n$ matrix with singular value
decomposition $M = \sum_{i=1}^n \sigma_i u_i v_i^T$, where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$. Assume that $\sigma_1 \ge 4q$, $\sigma_{k+1} \le \sigma_k/2$
and $q \ge \log n + 1$. Then, with $g \sim N(0, 1)^n$,



Remark C.3. We note that the above bound could easily be turned into a high probability guarantee.

The next lemma will be helpful in simplifying some expressions arising in the proof of
the theorem.


Lemma C.4. Let $E = e_s e_t^T$ and $\alpha \ge 1$. Then,

- $E^2 = E$ if $s = t$ and $E^2 = 0$ otherwise.

- $E M^\alpha E = C_\alpha E$, where $C_\alpha = \sum_{i=1}^n \sigma_i^\alpha \langle u_i, e_s\rangle \langle v_i, e_t\rangle$.






Proof. Both claims are immediate. 

In the following we will use $C = \mu_k(M)$ as a shorthand.

Lemma C.5. Let b = min . Then, C a ^S of + jzr . 

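Proof. Appealing to the definition of $C_\alpha$ from Lemma C.4, we have

$$C_\alpha = \sum_{i=1}^n \sigma_i^\alpha \langle u_i, e_s\rangle \langle v_i, e_t\rangle = \sum_{i=1}^k \sigma_i^\alpha \langle u_i, e_s\rangle \langle v_i, e_t\rangle + \sum_{i=k+1}^n \sigma_i^\alpha \langle u_i, e_s\rangle \langle v_i, e_t\rangle.$$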

Lemma 



C.6. Recall that b = min . We /lave, 



£ a l 

i 2 / j 



E||M«£MM|<l6 ( Tf + ^ 

Proof. First note that

$$\mathbb{E}\|M^\alpha E M^\beta g\|^2 = \|M^\alpha e_s\|^2\, \mathbb{E}\langle e_t^T M^\beta, g\rangle^2 = \|M^\alpha e_s\|^2\, \|e_t^T M^\beta\|^2,$$

where we used that $g \sim N(0, 1)^n$. Hence, by Jensen's inequality,

$$\mathbb{E}\|M^\alpha E M^\beta g\| \le \sqrt{\mathbb{E}\|M^\alpha E M^\beta g\|^2} = \|M^\alpha e_s\| \cdot \|e_t^T M^\beta\|.$$

It remains to bound the right hand side of the previous inequality. Indeed,



it- 

\\M a e s \\ = , J^o? a (u t ,e s ) 




n 1 2 a \ i— 

i i=i 



We can bound ||e f T M^|| with the same reasoning. 

Lemma C.7. Let A = M a ^EM a ^E---EM a ^EM a c with a { > 0. T/zen, 



Proof. First we apply Lemma C.4 to all intermediate terms $E M^{\alpha_i} E$ where $i \in \{2,\dots,\ell - 1\}$.
Then we apply Lemma C.5 and Lemma C.6 to the remaining term. Noting that $\delta^2 \le \delta$, since
$0 \le \delta \le 1$, we have established that



On the other hand, it is not hard to see that the following inequality holds,



1=1 

■ 

We are now ready to prove Theorem C.2. 

Proof of Theorem C.2. Observe that the matrix $(M + E)^q - M^q$ equals the sum of $2^q - 1$ matrices
that are either zero or of the form $A = M^{\alpha_1} E M^{\alpha_2} E \cdots E M^{\alpha_{\ell-1}} E M^{\alpha_\ell}$ as described in Lemma C.7.
Let us say that $\sum_{i=1}^{\ell} \alpha_i$ is the "order" of the matrix $A$. Clearly the order of the matrix $A$ is at
most $q - \ell + 1$. Furthermore, there are at most $\binom{q}{z}$ matrices of order $z$.

Using the fact that $\binom{q}{z} \le q^{q-z}$ and the assumption that $q \le \sigma_1/4$, we will apply Lemma C.7
to each such matrix and sum over the resulting error terms:
to each such matrix and sum over the resulting error terms: 

£(:W(^«<'+«)^*)<*&^r , «f(^«^ 1 ) 

Z=0 ' 2=0 

^ q-1 P-( 1 46 \ 

z=0 

< 9Sqaf 1 . 

In the last step we used that b ^ VlA? and the assumption that q > log(«) + 1. 

The theorem now follows straightforwardly. By the previous argument and linearity of
expectation, we have

\\((M + Ef -Ml)g\\ ^ 96qa^ 1 = 9min J 1,